Nicolas Ballas

Inference-time Physics Alignment of Video Generative Models with Latent World Models

Jan 15, 2026
Via arXiv

Learning Latent Action World Models In The Wild

Jan 08, 2026
Via arXiv

Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors

Sep 16, 2025
Via arXiv

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Jun 11, 2025
Via arXiv

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs

Jun 11, 2025
Via arXiv

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Apr 19, 2025
Via arXiv

Scaling Language-Free Visual Representation Learning

Apr 01, 2025
Via arXiv

Intuitive physics understanding emerges from self-supervised pretraining on natural videos

Feb 17, 2025
Via arXiv

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

Oct 04, 2024
Via arXiv

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Apr 30, 2024
Via arXiv